Eels were tracked from the Belgian coast (Nieuwpoort) through the North Sea as they migrate to their spawning grounds in the Atlantic Ocean. See our article for the migration routes. In brief, the routes were modeled through geolocation modelling based on logged data on temperature and depth obtained by data loggers that were externally attached to the eels. For more info on the method, we refer to the aforementioned article. In general, we found that the eels can take 2 routes: a northern route over the UK or a southern route through the English Channel.
Now that we analysed the horizontal movement patterns (i.e. the migration routes), we want to analyse the vertical movement patterns. This can provide information on how eels orient themselves or move in a bio-energetic efficient way. Two of our key questions is if the eels have a circadian pattern in their vertical movement and if they apply selective tidal stream transport (STST). It is known that eels apply a diel vertical migration pattern in the Atlantic Ocean: they dive to depths of ca. 800 m during daytime, but swim at depths of ca. 200 m during night time (see this article). However, it is unknown if eels show this or another pattern on the shallow continental shelf. Considering STST, this mode of transport has been illustrated for eels in estuaries. Since the North Sea and the English Channel have strong prevailing tidal currents, it is plausible that eels apply STST as well and hence behave differently during the ebbing and flooding tide. To gain more insight in the vertical movement pattern, we want to analyse the eel’s swimming depth (depth measurements every 5 minutes) over time in relation to the next relevant explanatory variables, being:
*Water currents are calculated as eastward and northward velocity vectors. Since the eels took 2 routes, I would split up the analysis for eels taking the southern route and eels taking the northern route.
Sys.setenv(TZ='GMT')
Sys.timezone()
## [1] "GMT"
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(mgcv)
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## This is mgcv 1.8-40. For overview type 'help("mgcv-package")'.
#library(splines) # for bs function
#library(REdaS)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
#library(sjPlot)
#library(EnvStats)
library(pracma)
##
## Attaching package: 'pracma'
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:mgcv':
##
## magic
## The following object is masked from 'package:purrr':
##
## cross
library(nlme) # for gls()
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(forecast) # for auto.arima() function
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
##
## Attaching package: 'forecast'
## The following object is masked from 'package:nlme':
##
## getResponse
library(tseries) # for Augmented Dickey-Fuller Test (adf.test())
library(visreg)
library(itsadug)
## Loading required package: plotfunctions
##
## Attaching package: 'plotfunctions'
## The following object is masked from 'package:ggplot2':
##
## alpha
## Loaded package itsadug 2.4 (see 'help("itsadug")' ).
data <- read.csv("./data/interim/data_circadian_tidal_moon_sun_5min.csv")
data$ID <- factor(data$ID)
data$datetime <- ymd_hms(data$datetime)
data$night_day <- factor(data$night_day)
data <- data %>%
rename(direction_x = U,
direction_y = V)
data_1eel <- filter(data, ID == "16031")
Create an autocorrelation function plot to explore the cyclic signals in the data. The horizontal blue lines in the plot indicate the confidence interval in the correlogram. First create plot for total dataset.
forecast::Acf(data_1eel$corrected_depth, type = c("correlation"), lag.max=max(dim(data_1eel)), plot = TRUE)
Next, create the ACF plot for the first 500 observations.
forecast::Acf(data_1eel$corrected_depth, type = c("correlation"), lag.max=500, plot = TRUE)
In the plot above, we see 2 cycles subsequently returning. The first is probably related to a tidal signal, as it occurs ca. every 12 hours at lag 144: (12 hours x 60 minutes) / 5 minutes = 144
The second is probably related to a circadian rythm with a 24 h pattern at ca. lag 288: (24 hours x 60 minutes) / 5 minutes = 288
To illustrate this, we added a green vertical line at lag 144 and a blue line at lag 288 to the plot.
vals_list <- forecast::Acf(data_1eel$corrected_depth, type = c("correlation"), lag.max=max(dim(data_1eel)), plot = FALSE)
plot(vals_list[[1]], xlim=c(0, 1000))
abline(v = 144, col = "darkgreen")
abline(v = 288, col = "darkblue")
The depth sensors are not 100% accurate. Hence, it can occur that when an eel is swimming close to the surface, the depth sensor registers a value as if the eel would be above the surface. Also, the depth sensors drift over time. A linear regression between the start of data logging (= when the tag was at zero atmospheric pressure) and the moment the tag surfaced (= when the tag was again at zero atmospheric pressure) was conducted at an earlier stage (to generate the trajectories). Hence the variable name ‘corrected’ depth.
plot(data_1eel$corrected_depth, col = rgb(red = 0.5, green = 0.5, blue = 0.5, alpha = 0.2))
summary(data_1eel$corrected_depth)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -169.184 -32.699 -17.962 -26.144 -4.303 1.078
data_max_depth <- data_1eel %>%
group_by(ID, Date) %>%
summarise(max_depth = min(corrected_depth))
## `summarise()` has grouped output by 'ID'. You can override using the `.groups`
## argument.
data_1eel <- left_join(data_1eel, data_max_depth, by = c("ID","Date"))
# Link trajectory data (source trajectory data script)
tr_data <- read_csv("./data/external/trajectory_data/eel_trajectories.csv")
## Rows: 1962 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Species, Sp_ID, Date, Max_Depth, Max_Temp
## dbl (18): ID, SD_Depth, Av_Depth, Min_Depth, SD_Temp, Av_Temp, Min_Temp, Tem...
## lgl (2): Max_PoV_WF, Behav_Class
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Select columns
tr_data <- dplyr::select(tr_data, ID, Date, Max_Depth)
tr_data <- rename(tr_data,
max_depth_bathymetry = Max_Depth)
# Process columns
tr_data$Date <- dmy(tr_data$Date)
tr_data$ID <- factor(tr_data$ID)
# Remove double dates per eel (ID)
#tr_data <- tr_data[!duplicated(tr_data[c('ID','Date')]),]
tr_data <- tr_data %>% # Add ID number to duplicate dates
group_by(ID, Date) %>%
add_tally()
duplicates <- filter(tr_data, n == 2) # Filter duplicate dates
duplicates <- duplicates %>% # Add ID number to distinguish between first and second duplicate
mutate(number_id = row_number())
duplicates <- filter(duplicates, number_id == 2) # Filter second duplicates
tr_data <- filter(tr_data, n != 2) # Remove duplicate dates from tracking dataset
# Bind 'duplicates' dataset with second duplicates to tracking dataset
tr_data <- ungroup(tr_data)
tr_data$n <- NULL
duplicates <- ungroup(duplicates)
duplicates$n <- NULL
duplicates$number_id <- NULL
tr_data <- rbind(tr_data, duplicates)
# Select relevant eels
tr_data <- filter(tr_data, ID == "16031" )
tr_data$ID <- factor(tr_data$ID) # rerun 'factor()' so the number of levels is set accurately
data_1eel$Date <- as.Date(data_1eel$Date)
data_1eel <- left_join(data_1eel, tr_data, by = c('ID', 'Date'))
# Remove line with max depth bathymetry > 300 m --> remove location in deep sea
data_1eel$max_depth_bathymetry <- -1*(as.numeric(data_1eel$max_depth_bathymetry))
data_1eel <- filter(data_1eel, max_depth_bathymetry >= -300)
Plot the daily depth range, the max depth and the max depth bathymetry. The latter 2 do not differ a lot because one of the assumptions in the geolocation model is that the eel’s daily depth is the daily max bathymetry.
par(mfrow=c(3,1))
plot(data_1eel$Date, data_1eel$corrected_depth)
plot(data_1eel$Date, data_1eel$max_depth)
plot(data_1eel$Date, data_1eel$max_depth_bathymetry)
Three depth variables over time for 1 eel: corrected swimming depth, daily maximum depth and the daily maximum bathymetry obtained via the geolocation model.
Eels are known to reside near the bottom. Therefore, it can be reasonably assumed that the daily max swimming depth corresponds with the max daily bathymetry. Hence, the relative depth is calculated as the swimming depth divided by the max depth of that day.
data_1eel$rel_depth <- data_1eel$corrected_depth / data_1eel$max_depth
plot(data_1eel$Date, data_1eel$rel_depth, col = rgb(red = 0.5, green = 0.5, blue = 0.5, alpha = 0.4))
Plot of the relative depth for 1 eel.
ggplot(data_1eel, aes(x = datetime,
y = corrected_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
Vertical movement pattern for 1 eel.
In this plot a distinctive behaviour can be observed between day and night; the vertical red line indicates midnight. During the daytime, the eels have a steady depth, while at night they have a higher vertical amplitude in the water column. Also, there are moments where the eel remains near the bottom which I think is attributed to the tides.
data_1eel_10days <- filter(data_1eel,datetime >= "2019-02-01 00:00:00", datetime <= "2019-02-11 00:00:00")
# Create line every 24 hours
gnu <- seq.POSIXt(from = lubridate::floor_date(data_1eel_10days$datetime[1], "day"), to= data_1eel_10days$datetime[nrow(data_1eel_10days)], by = 86400)
class(lubridate::floor_date(data_1eel_10days$datetime[1], "day"))
## [1] "POSIXct" "POSIXt"
ggplot(data_1eel_10days, aes(x = datetime,
y = corrected_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks ="1 hour") +
geom_vline(xintercept=gnu, color = "red", size = 1)
10-day detail of the vertical movement pattern for 1 eel.
ggplot(data_1eel, aes(x = datetime,
y = rel_depth)) +
geom_line() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
The relative depth for 1 eel.
ggplot(data_1eel_10days, aes(x = datetime,
y = rel_depth)) +
geom_line() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks ="1 hour") +
geom_vline(xintercept=gnu, color = "red", size = 1)
10-day detail of the relative depth for 1 eel.
ggplot(data_1eel, aes(x = datetime,
y = temperature)) +
geom_line() +
theme_minimal() +
ylab("Temperature (°C))") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
Water temperature pattern for 1 eel.
ggplot(data_1eel_10days, aes(x = datetime,
y = temperature)) +
geom_line() +
theme_minimal() +
ylab("Temperature (°C))") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks ="1 hour") +
geom_vline(xintercept=gnu, color = "red", size = 1)
10-day detail of the water temperature pattern for 1 eel.
ggplot(data_1eel, aes(x = temperature,
y = rel_depth)) +
geom_point(alpha = 0.1) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Temperature (°C)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
Relative depth over temperature for 1 eel.
ggplot(data_1eel_10days, aes(x = temperature,
y = rel_depth)) +
geom_point() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Temperature (°C)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
10-day detail of the relative depth over temperature pattern for 1 eel.
ggplot(data_1eel, aes(x = datetime,
y = max_depth)) +
geom_point() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
Daily max depth for 1 eel.
ggplot(data_1eel, aes(x = max_depth,
y = corrected_depth)) +
geom_point() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
Vertical movement pattern over the daily max depth for 1 eel.
ggplot(data_1eel, aes(x = direction_x,
y = rel_depth)) +
geom_point(alpha = 0.1) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Eastward current velocity (m/s)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
## Warning: Removed 4332 rows containing missing values (geom_point).
Relative depth over eastward current velocity for 1 eel.
ggplot(data_1eel_10days, aes(x = direction_x,
y = rel_depth)) +
geom_point() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Eastward current velocity (m/s)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
## Warning: Removed 960 rows containing missing values (geom_point).
10-day detail of the relative depth over eastward current velocity for 1 eel.
ggplot(data_1eel, aes(x = direction_y,
y = rel_depth)) +
geom_point(alpha = 0.1) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Northward current velocity (m/s)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
## Warning: Removed 4332 rows containing missing values (geom_point).
Relative depth over northward current velocity for 1 eel.
ggplot(data_1eel_10days, aes(x = direction_y,
y = rel_depth)) +
geom_point() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Northward current velocity (m/s)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
## Warning: Removed 960 rows containing missing values (geom_point).
10-day detail of the relative depth over northward current velocity for 1 eel.
Note that there is not a big difference because the depth range between day and night is equal, but the pattern is not.
ggplot(data_1eel, aes(x = night_day,
y = rel_depth)) +
geom_point(alpha = 0.1) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Circadian phases") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
Relative depth during daytime and at night for 1 eel.
ggplot(data_1eel, aes(x = 100*moon_fraction,
y = rel_depth)) +
geom_point(alpha = 0.2) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Illuminated moon phase (%)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
Relative depth over illuminated fraction of the moon for 1 eel.
ggplot(data_1eel_10days, aes(x = 100*moon_fraction,
y = rel_depth)) +
geom_point() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Illuminated moon phase (%)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
10-day detail of the relative depth over illuminated fraction of the moon for 1 eel.
ggplot(data_1eel, aes(x = sun_altitude,
y = rel_depth)) +
geom_point(alpha = 0.2) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Sun altitude (radians)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
The relative depth over the sun altitude for 1 eel.
ggplot(data_1eel_10days, aes(x = sun_altitude,
y = rel_depth)) +
geom_point() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Sun altitude (radians)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
10-day detail of the relative depth over the sun altitude for 1 eel.
#plot(data_1eel$sun_altitude)
data_1eel$sun_class <- NA
for (i in 2:dim(data_1eel)[1]){
if (data_1eel$sun_altitude[i] > data_1eel$sun_altitude[i-1]){
data_1eel$sun_class[i] = "rising"
} else{
data_1eel$sun_class[i] = "setting"
}}
data_1eel$sun_class <- factor(data_1eel$sun_class)
Note that there is not a big difference between sunrise and sunset because the depth range between day and night is equal, but the pattern is not. The NA is due to the first record of the dataset, for which a sun class could not be attributed according to the rule in the if-else loop.
ggplot(data_1eel, aes(x = sun_class,
y = rel_depth)) +
geom_point(alpha = 0.2) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Sun phase") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
Relative depth over the two different sun classes (sunrise and sunset) for 1 eel.
ggplot(data_1eel, aes(x = sun_azimuth,
y = rel_depth)) +
geom_point(alpha = 0.2) +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Sun azimuth (radians)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
Relative depth over the sun azimuth for 1 eel.
ggplot(data_1eel_10days, aes(x = sun_azimuth,
y = rel_depth)) +
geom_point() +
theme_minimal() +
ylab("Relative depth (m)") +
xlab("Sun azimuth (radians)") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14))
10-day detail of the relative depth over the sun azimuth for 1 eel.
For model construction, the exploratory variables need to be scaled. Max value should be between 0.5 - 5.
# direction_x
summary(data_1eel$direction_x)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -1.756 -0.284 0.002 0.011 0.312 1.579 4332
#iqr <- 0.312 - (-0.284 ) #3rd QU - 1st Qu
#data_1eel$direction_x_centre <- (data_1eel$direction_x - median(data_1eel$direction_x))/iqr
# direction_y
summary(data_1eel$direction_y)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -0.771 -0.172 0.007 0.015 0.199 1.017 4332
#iqr <- 0.199 - (-0.172) #3rd QU - 1st Qu
#data_1eel$direction_y_centre <- (data_1eel$direction_y - median(data_1eel$direction_y))/iqr
# Temperature
summary(data_1eel$temperature)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.344 9.281 9.938 9.809 10.688 12.375
#iqr <- 10.688 - (9.266 ) #3rd QU - 1st Qu
#data_1eel$temperature_centre <- (data_1eel$temperature - median(data_1eel$temperature))/iqr
data_1eel$temperature_scaled <- data_1eel$temperature / 10
# Moon illumination
summary(data_1eel$moon_fraction)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0001508 0.1299341 0.4100736 0.4585004 0.7963852 0.9999797
#iqr <- 0.7966425 - (0.1311380) #3rd QU - 1st Qu
#data_1eel$moon_fraction_centre <- (data_1eel$moon_fraction - median(data_1eel$moon_fraction))/iqr
# Sun altitude
summary(data_1eel$sun_altitude)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.1107 -0.7703 -0.2729 -0.3103 0.1559 0.5006
#iqr <- 0.1584 - (-0.7657) #3rd QU - 1st Qu
#data_1eel$sun_altitude_centre <- (data_1eel$sun_altitude - median(data_1eel$sun_altitude))/iqr
# Sun azimuth
summary(data_1eel$sun_azimuth)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.1400445 -1.3306698 -0.0000697 0.0006432 1.3316443 3.1411577
#iqr <- 1.322311 - (-1.326227) #3rd QU - 1st Qu
#data_1eel$sun_azimuth_centre <- (data_1eel$sun_azimuth - median(subset$sun_azimuth))/iqr
In case it would be needed, the response variable is detrended by subtracting a value by the previous value.
data_1eel <- data_1eel %>%
group_by(ID) %>%
arrange(datetime) %>%
mutate(diff_rel_depth = rel_depth - lag(rel_depth))
par(mfrow=c(2,1))
plot(data_1eel$datetime, data_1eel$rel_depth)
plot(data_1eel$datetime, data_1eel$diff_rel_depth)
Check ACF & PACF plot if data is stationary
forecast::Acf(data_1eel$diff_rel_depth, type = c("correlation"), lag.max=20, plot = TRUE)
forecast::Pacf(data_1eel$diff_rel_depth, lag.max=20, plot = TRUE)
Next, a model was applied to a subset of 5 eels with a similar migration path (through the Channel). I want to apply this complex model first on only a few fish before scaling up the complexity.
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:pracma':
##
## expm, lu, tril, triu
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'lme4'
## The following object is masked from 'package:nlme':
##
## lmList
library(glmmTMB)
data_5eels <- filter(data, ID == "16031" |
ID == "17535" |
ID == "17536" |
ID == "15777" |
ID == "17510")
data_5eels$ID <- factor(data_5eels$ID)
unique(data_5eels$ID)
## [1] 16031 15777 17510 17536 17535
## Levels: 15777 16031 17510 17535 17536
For this study I am interested in the vertical movement behaviour on the contintental shelf. The vertical behaviour in oceanic conditions has already been described.
data_5eels <- data_5eels[!(data_5eels$ID == "17535" & data_5eels$datetime > '2020-01-11 00:00:00'),]
plot(data_5eels$corrected_depth)
max_depth <- data_5eels %>%
group_by(ID, Date) %>%
summarise(max_depth = min(corrected_depth))
## `summarise()` has grouped output by 'ID'. You can override using the `.groups`
## argument.
data_5eels <- left_join(data_5eels, max_depth, by = c("ID","Date"))
Note that I will not use this, but leave the code as is just in case.
# Link trajectory data (source trajectory data script)
tr_data <- read_csv("./data/external/trajectory_data/eel_trajectories.csv")
## Rows: 1962 Columns: 25
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Species, Sp_ID, Date, Max_Depth, Max_Temp
## dbl (18): ID, SD_Depth, Av_Depth, Min_Depth, SD_Temp, Av_Temp, Min_Temp, Tem...
## lgl (2): Max_PoV_WF, Behav_Class
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Select columns
tr_data <- dplyr::select(tr_data, ID, Date, Max_Depth)
tr_data <- rename(tr_data,
max_depth_bathymetry = Max_Depth)
# Process columns
tr_data$Date <- dmy(tr_data$Date)
tr_data$ID <- factor(tr_data$ID)
# Remove double dates per eel (ID)
#tr_data <- tr_data[!duplicated(tr_data[c('ID','Date')]),]
tr_data <- tr_data %>% # Add ID number to duplicate dates
group_by(ID, Date) %>%
add_tally()
duplicates <- filter(tr_data, n == 2) # Filter duplicate dates
duplicates <- duplicates %>% # Add ID number to distinguish between first and second duplicate
mutate(number_id = row_number())
duplicates <- filter(duplicates, number_id == 2) # Filter second duplicates
tr_data <- filter(tr_data, n != 2) # Remove duplicate dates from tracking dataset
# Bind 'duplicates' dataset with second duplicates to tracking dataset
tr_data <- ungroup(tr_data)
tr_data$n <- NULL
duplicates <- ungroup(duplicates)
duplicates$n <- NULL
duplicates$number_id <- NULL
tr_data <- rbind(tr_data, duplicates)
# Select relevant eels
tr_data <- filter(tr_data, ID == "16031" |
ID == "17535" |
ID == "17536" |
ID == "15777" |
ID == "17510")
tr_data$ID <- factor(tr_data$ID) # rerun 'factor()' so the number of levels is set accurately
data_5eels$Date <- as.Date(data_5eels$Date)
data_5eels <- left_join(data_5eels, tr_data, by = c('ID', 'Date'))
# Remove line with max depth bathymetry > 300 m --> remove location in deep sea
#data_5eels$max_depth_bathymetry <- as.numeric(data_5eels$max_depth_bathymetry)
#data_5eels <- filter(data_5eels, max_depth_bathymetry >= -300)
Plot the daily depth range, the max depth and the max depth bathymetry. The latter 2 do not differ a lot because one of the assumptions in the geolocation model is that the eel’s daily depth is the daily max bathymetry.
par(mfrow=c(3,1))
plot(data_5eels$Date, data_5eels$corrected_depth)
plot(data_5eels$Date, data_5eels$max_depth)
plot(data_5eels$Date, data_5eels$max_depth_bathymetry)
data_5eels$rel_depth <- data_5eels$corrected_depth / data_5eels$max_depth
plot(data_5eels$Date, data_5eels$rel_depth)
# Remove rows with NA values
data_5eels <- na.omit(data_5eels)
# Classify sun_altitude as rising or setting sun
data_5eels$sun_class <- NA
for (i in 2:dim(data_5eels)[1]){
if (data_5eels$sun_altitude[i] > data_5eels$sun_altitude[i-1]){
data_5eels$sun_class[i] = "rising"
} else{
data_5eels$sun_class[i] = "setting"
}}
# Set as factor
data_5eels$sun_class <- factor(data_5eels$sun_class)
# Remove first line of each eel, since that "sun_class" value is incorrect. The for-loop does not take into account different eels, but runs over the complete dataset.
data_5eels <- data_5eels %>%
group_by(ID) %>%
slice(-1)
For model construction, the exploratory variables need to be scaled. Max value should be between 0.5 - 5.
# direction_x
summary(data_5eels$direction_x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.755500 -0.264700 -0.003000 0.007684 0.280800 1.614200
# direction_y
summary(data_5eels$direction_y)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.349200 -0.183000 -0.001000 0.009851 0.203600 1.720400
# Temperature
summary(data_5eels$temperature)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.391 9.547 10.703 10.548 11.688 13.203
data_5eels$temperature_scaled <- data_5eels$temperature / 10
# Moon illumination
summary(data_5eels$moon_fraction)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000752 0.1598697 0.4552128 0.4806714 0.8092580 0.9999797
# Sun altitude
summary(data_5eels$sun_altitude)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.1415 -0.7938 -0.2910 -0.3353 0.1378 0.5006
# Sun azimuth
summary(data_5eels$sun_azimuth)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.1415520 -1.3090434 0.0000505 0.0138621 1.3106045 3.1411577
data_5eels <- data_5eels %>%
arrange(ID, datetime) %>%
group_by(ID) %>%
mutate(times = factor(row_number()))
When running the glmer() function, it gives an error. Not sure why, but based on fora searches it might have to do with the data complexity and/or size.
#model_glmer <- glmer(rel_depth ~ direction_x_centre +
# direction_y_centre +
# moon_fraction_centre +
# sun_altitude_centre +
# sun_azimuth_centre +
# (1|ID),
# data = data_5eels,
# family = beta_family(),
# na.action = na.omit)
#summary(model_glmer)
Returns the following error:
# Error in (function (fr, X, reTrms, family, nAGQ = 1L, verbose = 0L, maxit = 100L, :
#(maxstephalfit) PIRLS step-halvings failed to reduce deviance in pwrssUpdate
The function glmmTMB() runs when no AR1 construct is implemented. For more info on the method implementation, see this website. The glmmTMB R-package was used.
Update: I included the positive corrected_depth values, hence, the rel_depth now contains negative values so a model with beta-distribution is unable to run.
summary(data_5eels$rel_depth)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.34012 0.09211 0.52734 0.48313 0.83675 1.00000
#model_tmb <- glmmTMB(rel_depth ~ direction_x +
# direction_y +
# moon_fraction +
# sun_altitude +
# sun_azimuth,
# data = data_5eels,
# family = beta_family(),
# na.action = na.omit)
#summary(model_tmb)
When the AR1 term is added, a memory error occurs. This is probably due to the fact that this function is rather viable for datasets up to 100k (see documentation)
#model_tmb_ar1 <- glmmTMB(rel_depth ~ direction_x_centre +
# direction_y_centre +
# moon_fraction_centre +
# sun_altitude_centre +
# sun_azimuth_centre +
# ar1(times + 0 | ID),
# family = beta_family(),
# data = subset)
Returns the following error:
#Error: cannot allocate vector of size 4.8 Gb
The function bam from the mgcv package is designed for large datasets (see documentation).
To implement the AR1 strucgture in the model, select the first record per eel and flag as true.
# Select first record per eel and flag as 'TRUE'
t.first <- data_5eels[match(unique(data_5eels$ID), data_5eels$ID),]
t.first <- select(t.first, ID, datetime)
t.first$start.event <- TRUE
# Merge with dataset
data_5eels <- left_join(data_5eels, t.first, by = c("ID", "datetime"))
# Set NA as FALSE
data_5eels[c("start.event")][is.na(data_5eels[c("start.event")])] <- FALSE
# Run gamm model with autocorrelation argument
# Full model
bam_model <- bam(rel_depth ~ s(direction_x) +
s(direction_y) +
s(moon_fraction) +
s(sun_altitude) +
s(sun_azimuth) +
sun_class +
s(ID, bs="re"),
family = gaussian(), data = data_5eels, discrete = TRUE,
#AR.start=subset$start.event,
rho=0.90,
na.action = na.omit)
Calculate rho (= acf value) which can be fed into the bam_model for model optimisation. However, this results in an error for a model of the beta family.
# Calculate Rho, the autocorrelation value of a specific lag
#valRho <- acf(resid(bam_model), plot=TRUE)$acf[2]
#Error in plot.window(...) : need finite 'ylim' values
Show summary of the model
summary(bam_model)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## rel_depth ~ s(direction_x) + s(direction_y) + s(moon_fraction) +
## s(sun_altitude) + s(sun_azimuth) + sun_class + s(ID, bs = "re")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.477439 0.037303 12.80 <2e-16 ***
## sun_classsetting 0.002718 0.007542 0.36 0.719
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(direction_x) 4.361 5.591 1.809 0.098 .
## s(direction_y) 1.001 1.001 0.651 0.420
## s(moon_fraction) 4.236 5.194 1.659 0.137
## s(sun_altitude) 1.001 1.001 0.325 0.569
## s(sun_azimuth) 4.670 5.855 5.226 3.15e-05 ***
## s(ID) 3.814 5.000 16.758 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0499 Deviance explained = 5.04%
## fREML = -18593 Scale est. = 0.13465 n = 45020
Check the model
par(mfrow = c(2, 2))
gam.check(bam_model)
##
## Method: fREML Optimizer: perf chol
## $grad
## [1] 1.562343e-06 -3.118314e-04 1.348723e-06 -2.585771e-04 5.227809e-05
## [6] 3.020207e-07 1.004495e-04
##
## $hess
## [,1] [,2] [,3] [,4] [,5]
## 8.215911e-01 -9.877175e-06 1.445064e-02 -1.563327e-06 4.341655e-03
## -9.877175e-06 3.117270e-04 5.399458e-06 -2.759905e-10 2.845264e-06
## 1.445064e-02 5.399458e-06 5.593165e-01 -1.391139e-06 6.930835e-03
## -1.563327e-06 -2.759905e-10 -1.391139e-06 2.584836e-04 -5.221674e-05
## 4.341655e-03 2.845264e-06 6.930835e-03 -5.221674e-05 1.165441e+00
## 7.553825e-03 -2.268884e-07 1.127069e-02 -3.021913e-07 -5.409676e-03
## d -1.680286e+00 -6.103953e-05 -1.618025e+00 -1.002699e-04 -1.834949e+00
## [,6] [,7]
## 7.553825e-03 -1.680286e+00
## -2.268884e-07 -6.103953e-05
## 1.127069e-02 -1.618025e+00
## -3.021913e-07 -1.002699e-04
## -5.409676e-03 -1.834949e+00
## 1.818722e+00 -1.906891e+00
## d -1.906891e+00 2.250650e+04
##
## Model rank = 52 / 52
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(direction_x) 9.00 4.36 0.67 <2e-16 ***
## s(direction_y) 9.00 1.00 0.69 <2e-16 ***
## s(moon_fraction) 9.00 4.24 0.94 <2e-16 ***
## s(sun_altitude) 9.00 1.00 1.01 0.68
## s(sun_azimuth) 9.00 4.67 1.02 0.95
## s(ID) 5.00 3.81 NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This document describes the GAM procedure following Zuur et al. (2012) ‘Beginner’s Guide to Generalized Additive Models with R’. The analysis is applied on a single eel (ID A16031) and a few days for simplicity. Spoiler alert: it shows that the model runs very good on a few days but is off when the amount of data (i.e. days of data) is increased.
subset_1eel <- filter(data_5eels, ID == "16031")
Plot the depth data. Note that the depth sensor data drifts. A linear regression was applied to correct for this drift, hence the variable name ‘corrected_depth’. Also, the relative depth might be a better response variable and was calculated as the corrected_depth divided by the max depth for that day. It is assumed that the max depth resembles the sea bottom.
ggplot(subset_1eel, aes(x = datetime, y = corrected_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "5 days")
ggplot(subset_1eel, aes(x = datetime, y = rel_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "5 days")
subset_1eel_2days_1 <- filter(subset_1eel, datetime > '2018-12-12 00:00:00',
datetime < '2018-12-14 00:00:00')
ggplot(subset_1eel_2days_1, aes(x = datetime, y = corrected_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
ggplot(subset_1eel_2days_1, aes(x = datetime, y = rel_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
Apply the GAM. One with a Gaussian distribution and one with a binomial distribution (since the data is between 0 and 1). The explanatory variables I am interested in are the current variables (obtained via CEFAS model) and the factor with circadian phases ‘day’ or ‘night’. The current variables are split in two vectors giving the current speed in direction ‘x’ and ‘y’. Before the variables are fed into the GAM, a correlation analysis is applied between direction_x and direction_y. When a strong correlation (> |0.6|) is observed, variable ‘direction_x’ is chosen.
Despite the values of the response variable are between 0 and 1, indicating a binomial distribution, the GAM with a Gaussian distribution results in the best fit and lowest AIC. Further, note that the edf in the model output illustrate non-linear relationships.
cor(subset_1eel_2days_1$direction_x, subset_1eel_2days_1$direction_y)
## [1] 0.9327713
gam1 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_1)
gam2 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_1, family = binomial(link = "logit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
AIC(gam1, gam2)
## df AIC
## gam1 13.294183 41.90012
## gam2 7.653967 492.52814
summary(gam1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## rel_depth ~ s(direction_x, by = night_day) + night_day
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.70278 0.02116 33.21 <2e-16 ***
## night_daynight -0.39104 0.02499 -15.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(direction_x):night_dayday 2.030 2.541 6.178 0.00111 **
## s(direction_x):night_daynight 8.264 8.848 34.471 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.558 Deviance explained = 56.7%
## GCV = 0.062786 Scale est. = 0.061444 n = 575
In a next step, the smoothers are plotted for the best model (gam1)
par(mfrow = c(2,1))
plot(gam1, select = 1, main = "Day")
plot(gam1, select = 2, main = "Night")
Check model performance.
par(mfrow = c(2, 2))
gam.check(gam1)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 11 iterations.
## The RMS GCV score gradient at convergence was 5.677614e-07 .
## The Hessian was positive definite.
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(direction_x):night_dayday 9.00 2.03 0.29 <2e-16 ***
## s(direction_x):night_daynight 9.00 8.26 0.29 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
subset_1eel_2days_1$residuals <- residuals(gam1, type="response")
subset_1eel_2days_1$predicted <- predict.bam(gam1)
subset_1eel_2days_1$fitted <- fitted(gam1, type="response")
plot <- ggplot(subset_1eel_2days_1, aes(x=numericdate, y=rel_depth)) +
geom_line(alpha = 0.2) +
geom_line(subset_1eel_2days_1, mapping = aes(x=numericdate, y=fitted), colour = "red")
plot
Plot of the residuals vs covariable ‘direction_x’
plot <- ggplot(subset_1eel_2days_1, aes(x=direction_x, y=residuals)) +
geom_point(alpha = 0.6)
plot
E1 <- resid(gam1, type = "pearson")
F1 <- fitted(gam1, type = "response")
par(mfrow = c(2,2))
plot(x = F1, y = E1, xlab = "Fitted values", ylab = "Pearson residuals")
plot(x = subset_1eel_2days_1$direction_x, y = E1, xlab = "Direction x", ylab = "Pearson residuals")
abline(h = 0, lty = 2)
plot(E1 ~ night_day, data = subset_1eel_2days_1, ylab = "Pearson residuals", xlab = "Day Night")
hist(E1, xlab = "Pearson residuals", main = "")
Check overdispersion
E1 <- resid(gam1, type = "pearson")
N <- nrow(subset_1eel_2days_1)
p <- length(coef(gam1))
overdispersion <- sum(E1^2)/(N-p)
overdispersion
## [1] 0.06229675
subset_1eel_2days_2 <- filter(subset_1eel, datetime > '2019-02-04 00:00:00',
datetime < '2019-02-06 00:00:00')
ggplot(subset_1eel_2days_2, aes(x = datetime, y = corrected_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
ggplot(subset_1eel_2days_2, aes(x = datetime, y = rel_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
Apply GAM
cor(subset_1eel_2days_2$direction_x, subset_1eel_2days_2$direction_y)
## [1] 0.8510163
gam1 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_2)
gam2 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_2days_2, family = binomial(link = "logit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
AIC(gam1, gam2)
## df AIC
## gam1 18.64829 -135.9035
## gam2 16.09634 344.0297
summary(gam1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## rel_depth ~ s(direction_x, by = night_day) + night_day
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.49473 0.01710 28.939 < 2e-16 ***
## night_daynight 0.08625 0.02208 3.907 0.000112 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(direction_x):night_dayday 6.798 7.703 25.53 <2e-16 ***
## s(direction_x):night_daynight 8.850 8.993 41.70 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.612 Deviance explained = 62.8%
## GCV = 0.040936 Scale est. = 0.03905 n = 383
In a next step, the smoothers are plotted for the best model (gam1)
par(mfrow = c(2,1))
plot(gam1, select = 1, main = "Day")
plot(gam1, select = 2, main = "Night")
Check model performance.
par(mfrow = c(2, 2))
gam.check(gam1)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 11 iterations.
## The RMS GCV score gradient at convergence was 7.60565e-07 .
## The Hessian was positive definite.
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(direction_x):night_dayday 9.00 6.80 0.33 <2e-16 ***
## s(direction_x):night_daynight 9.00 8.85 0.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
subset_1eel_2days_2$residuals <- residuals(gam1, type="response")
subset_1eel_2days_2$predicted <- predict.bam(gam1)
subset_1eel_2days_2$fitted <- fitted(gam1, type="response")
plot <- ggplot(subset_1eel_2days_2, aes(x=numericdate, y=rel_depth)) +
geom_line(alpha = 0.2) +
geom_line(subset_1eel_2days_2, mapping = aes(x=numericdate, y=fitted), colour = "red")
plot
Plot of the residuals vs covariable ‘direction_x’
plot <- ggplot(subset_1eel_2days_2, aes(x=direction_x, y=residuals)) +
geom_point(alpha = 0.6)
plot
E1 <- resid(gam1, type = "pearson")
F1 <- fitted(gam1, type = "response")
par(mfrow = c(2,2))
plot(x = F1, y = E1, xlab = "Fitted values", ylab = "Pearson residuals")
plot(x = subset_1eel_2days_2$direction_x, y = E1, xlab = "Direction x", ylab = "Pearson residuals")
abline(h = 0, lty = 2)
plot(E1 ~ night_day, data = subset_1eel_2days_2, ylab = "Pearson residuals", xlab = "Day Night")
hist(E1, xlab = "Pearson residuals", main = "")
Check overdispersion
E1 <- resid(gam1, type = "pearson")
N <- nrow(subset_1eel_2days_2)
p <- length(coef(gam1))
overdispersion <- sum(E1^2)/(N-p)
overdispersion
## [1] 0.03930261
subset_1eel_6days <- filter(subset_1eel, datetime > '2019-02-04 00:00:00',
datetime < '2019-02-10 00:00:00')
ggplot(subset_1eel_6days, aes(x = datetime, y = corrected_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
ggplot(subset_1eel_6days, aes(x = datetime, y = rel_depth)) +
geom_line() +
theme_minimal() +
ylab("Depth (m)") +
xlab("Date") +
theme(axis.title.y = element_text(margin = margin(r = 10))) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
theme(axis.text = element_text(size = 12),
axis.title = element_text(size = 14)) +
scale_x_datetime(date_breaks = "1 day")
Apply GAM
Note that the R² adj reduced by more than half (33%) by addding 4 more days of data.
cor(subset_1eel_6days$direction_x, subset_1eel_6days$direction_y)
## [1] 0.6764244
gam1 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_6days)
gam2 <- gam(rel_depth ~ s(direction_x, by = night_day) + night_day, data = subset_1eel_6days, family = binomial(link = "logit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
AIC(gam1, gam2)
## df AIC
## gam1 17.37228 108.7839
## gam2 11.11351 1315.2891
summary(gam1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## rel_depth ~ s(direction_x, by = night_day) + night_day
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.51872 0.01202 43.142 < 2e-16 ***
## night_daynight 0.09847 0.01560 6.311 3.96e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(direction_x):night_dayday 7.922 8.677 19.26 <2e-16 ***
## s(direction_x):night_daynight 6.450 7.535 43.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.318 Deviance explained = 32.7%
## GCV = 0.064255 Scale est. = 0.063341 n = 1151
In a next step, the smoothers are plotted for the best model (gam1)
par(mfrow = c(2,1))
plot(gam1, select = 1, main = "Day")
plot(gam1, select = 2, main = "Night")
Check model performance. There seems to be a trend in de residuals.
par(mfrow = c(2, 2))
gam.check(gam1)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 7 iterations.
## The RMS GCV score gradient at convergence was 2.439333e-07 .
## The Hessian was positive definite.
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(direction_x):night_dayday 9.00 7.92 0.17 <2e-16 ***
## s(direction_x):night_daynight 9.00 6.45 0.17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model fit is reduced substantially.
subset_1eel_6days$residuals <- residuals(gam1, type="response")
subset_1eel_6days$predicted <- predict.bam(gam1)
subset_1eel_6days$fitted <- fitted(gam1, type="response")
plot <- ggplot(subset_1eel_6days, aes(x=numericdate, y=rel_depth)) +
geom_line(alpha = 0.2) +
geom_line(subset_1eel_6days, mapping = aes(x=numericdate, y=fitted), colour = "red")
plot
Plot of the residuals vs covariable ‘direction_x’
plot <- ggplot(subset_1eel_6days, aes(x=direction_x, y=residuals)) +
geom_point(alpha = 0.6)
plot
E1 <- resid(gam1, type = "pearson")
F1 <- fitted(gam1, type = "response")
par(mfrow = c(2,2))
plot(x = F1, y = E1, xlab = "Fitted values", ylab = "Pearson residuals")
plot(x = subset_1eel_6days$direction_x, y = E1, xlab = "Direction x", ylab = "Pearson residuals")
abline(h = 0, lty = 2)
plot(E1 ~ night_day, data = subset_1eel_6days, ylab = "Pearson residuals", xlab = "Day Night")
hist(E1, xlab = "Pearson residuals", main = "")
Check overdispersion
E1 <- resid(gam1, type = "pearson")
N <- nrow(subset_1eel_6days)
p <- length(coef(gam1))
overdispersion <- sum(E1^2)/(N-p)
overdispersion
## [1] 0.06354404